Linear time series models for term weighting in information retrieval

Author

  • Miles Efron
Abstract

Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms. However, realistic scenarios yield additional information about terms in a collection. Of interest in this paper is the temporal behavior of terms as a collection changes over time. We propose capturing each term’s collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize that the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term’s prior observations. On the other hand, a linear time series model for a strong discriminator’s collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
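
The hypothesis above can be illustrated with a minimal sketch. The abstract does not specify the paper's three measures, so everything here is an assumption made for illustration: a first-order autoregressive model fitted by ordinary least squares stands in for the "linear model of the term's prior observations", the coefficient of determination R² stands in for goodness of fit, and the hypothetical `temporal_weight` function simply returns 1 − R², so that a poorly predicted term scores higher.

```python
from typing import Sequence


def ar1_fit_r2(series: Sequence[float]) -> float:
    """Fit x_t = a + b * x_{t-1} by least squares and return R^2 in [0, 1].

    Illustrative stand-in for a linear time series model; not the
    model specification used in the paper.
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev = list(series[:-1])  # predictors: x_{t-1}
    curr = list(series[1:])   # targets: x_t
    n = len(curr)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    sst = sum((c - mean_curr) ** 2 for c in curr)
    if sst == 0.0:
        return 1.0  # constant target: trivially predictable
    if sxx == 0.0:
        return 0.0  # constant predictor carries no linear information
    slope = sxy / sxx
    intercept = mean_curr - slope * mean_prev
    sse = sum((c - (intercept + slope * p)) ** 2 for p, c in zip(prev, curr))
    # Clamp to guard against tiny floating-point excursions outside [0, 1].
    return max(0.0, min(1.0, 1.0 - sse / sst))


def temporal_weight(series: Sequence[float]) -> float:
    """Hypothetical term weight: high when the linear model fits poorly."""
    return 1.0 - ar1_fit_r2(series)


# Made-up collection frequencies per time interval (not data from the paper).
steady_term = [100, 110, 120, 130, 140, 150]  # smooth growth, well predicted
bursty_term = [5, 0, 40, 2, 0, 35]            # erratic, poorly predicted

steady_w = temporal_weight(steady_term)
bursty_w = temporal_weight(bursty_term)
```

Under these assumptions, the smoothly growing term receives a weight near zero and the bursty term a noticeably larger one, which matches the direction of the stated hypothesis. A higher-order model or a different fit statistic could be substituted without changing the idea.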

Similar resources

A novel term weighting scheme based on discrimination power obtained from past retrieval results

Term weighting for document ranking and retrieval has been an important research topic in information retrieval for decades. We propose a novel term weighting method based on a hypothesis that a term’s role in accumulated retrieval sessions in the past affects its general importance regardless. It utilizes availability of past retrieval results consisting of the queries that contain a particula...

Which Methodology is Better for Combining Linear and Nonlinear Models for Time Series Forecasting?

Both theoretical and empirical findings have suggested that combining different models can be an effective way to improve the predictive performance of each individual model. This is especially the case when the models in the ensemble are quite different. Hybrid techniques that decompose a time series into its linear and nonlinear components are one of the most important kinds of the hybrid model...

The Effect of Term Importance Degree on Text Retrieval

Various approaches to index term-weighting have been investigated. In fact, term-weighting is an indispensable process for document ranking in most retrieval systems. Moreover, real information retrieval systems have to deal with the explosive growth of documents of various sizes and terms of various frequencies, because an appropriate term-weighting scheme has a crucial impact on the overall perfor...

Overview and Comparison of Short-term Interval Models for Financial Time Series Forecasting

In recent years, various time series models have been proposed for financial market forecasting. In each case, the accuracy of time series forecasting models is fundamental to decision making, and hence research on improving the effectiveness of forecasting models has been carried on. Many researchers have compared different time series models in order to determine more efficien...

Global Statistics in Proximity Weighting Models

Information retrieval systems often use proximity or term dependence models to increase the effectiveness of document retrieval. Many of the existing proximity models examine document-level local statistics, such as the frequencies that pairs of query terms occur within fixed-size windows of each document, before applying standard or adapted weighting functions – for instance Markov Random Fiel...

Journal:
  • JASIST

Volume 61, Issue

Pages -

Publication date: 2010